14 research outputs found

    Quantifying origin and character of long-range correlations in narrative texts

    In natural language, short sentences are considered efficient for communication. However, a text composed exclusively of such sentences looks technical and reads as boring. A text composed of long ones, on the other hand, demands significantly more effort to comprehend. Studying the characteristics of sentence length variability (SLV) in a large corpus of world-famous literary texts shows that an appealing, aesthetic optimum lies somewhere in between and involves a self-similar, cascade-like alternation of sentences of various lengths. A related quantitative observation is that the power spectra S(f) of SLV characterized this way universally develop a convincing '1/f^beta' scaling with an average exponent beta ≈ 1/2, close to what has previously been identified in musical compositions and in brain waves. The overwhelming majority of the studied texts simply obey such fractal attributes, but especially spectacular in this respect are hypertext-like, "stream of consciousness" novels. In addition, these novels appear to develop structures characteristic of irreducibly interwoven sets of fractals, called multifractals. Scaling of S(f) in the present context implies the existence of long-range correlations in texts, and the appearance of multifractality indicates that these correlations also carry a nonlinear component. A distinct role of full stops in inducing the long-range correlations is evidenced by the fact that the above quantitative characteristics manifest themselves in the variation of full-stop recurrence times along texts, and thus in SLV, but to a much lesser degree in the recurrence times of the most frequent words. In the latter case the nonlinear correlations, and thus multifractality, disappear completely for all the texts considered. Treated as one extra word, however, the full stops still appear to obey the Zipfian rank-frequency distribution.
    Comment: 28 pages, 8 figures, accepted for publication in Information Science
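The quantities named in this abstract (a sentence-length series, its power spectrum S(f), and the exponent beta in S(f) ~ 1/f^beta) can be illustrated with a minimal sketch. It is not the authors' procedure: splitting on "." only, the naive DFT, and the unbinned log-log fit are simplifying assumptions, and all function names are hypothetical.

```python
import cmath
import math


def sentence_lengths(text):
    """Split on full stops and return the word count of each non-empty sentence.

    Assumption: only "." ends a sentence; abbreviations, "?" and "!" are ignored.
    """
    sentences = [s for s in text.split(".") if s.strip()]
    return [len(s.split()) for s in sentences]


def power_spectrum(series):
    """Naive O(n^2) DFT power spectrum of the mean-removed series.

    Returns (frequency, power) pairs for k = 1 .. n // 2.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum


def spectral_exponent(spectrum):
    """Least-squares slope of log S(f) against log f, returned as beta in S ~ 1/f^beta."""
    points = [(math.log(f), math.log(p)) for f, p in spectrum if p > 0]
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return -slope
```

For a real text, `spectral_exponent(power_spectrum(sentence_lengths(text)))` gives a rough estimate of beta; a careful analysis would need a long text and some averaging or binning of the spectrum before fitting.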

    Characteristics of the context-driven meta-modeling paradigm (CDMM-P)

    The paper introduces a novel Context-Driven Meta-Modeling Paradigm (CDMM-P) and discusses its properties. CDMM-P changes the traditional division of responsibilities within the data layer of software systems. It allows objects representing data and objects representing relationships to be used interchangeably. Decomposing these responsibilities weakens the internal dependencies of the data model, which in turn allows the whole data model to be constructed at run time. The proposed paradigm offers exceptional flexibility in implementing the data layer of software systems. It may be applied to domain modeling in enterprise applications as well as to the modeling of any ontology, including the construction of modeling and meta-modeling languages. As such, CDMM-P underpins the broad domain of Context-Driven Meta-Modeling Technology (CDMM-T).

    Parallelization of the Levenshtein distance algorithm

    This paper presents a method for parallelizing the Levenshtein distance algorithm when it is applied to very large strings. The proposed approach was implemented using .NET Framework 4.0, with threads managed through the System.Threading.Tasks namespace. The algorithms developed in this study were tested on a high-performance machine using Xamarin Mono (for Linux RedHat/Fedora OS). The computational results demonstrate a high level of efficiency of the proposed parallelization procedure.
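The abstract does not say how the computation is split across threads. One common way to expose parallelism in the Levenshtein recurrence is anti-diagonal (wavefront) ordering, sketched below under that assumption. The sketch runs sequentially and only marks the loop whose iterations are mutually independent; the function name is hypothetical.

```python
def levenshtein_wavefront(a, b):
    """Edit distance computed one anti-diagonal at a time.

    Cell (i, j) depends only on (i-1, j), (i, j-1) and (i-1, j-1), all of which
    lie on the two previous anti-diagonals, so the cells within one
    anti-diagonal do not depend on each other.
    """
    m, n = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j

    for d in range(2, m + n + 1):  # anti-diagonal index d = i + j
        i_lo = max(1, d - n)
        i_hi = min(m, d - 1)
        # A parallel version could divide this inner loop among workers.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n]
```

The full table is kept here for clarity. For very large strings, an implementation would more likely keep only the last few anti-diagonals in memory and process cells in blocks, so that each worker gets enough work per synchronization step.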

    Mechanizm identyfikacji i klasyfikacji treści (A mechanism for content identification and classification)

    This paper presents a mechanism for identifying and classifying content, based on a term-weighting method that uses inverse document frequency, term frequency, and the Levenshtein distance. The proposed mechanism was implemented in a program that analyzes the topics and descriptions of diploma theses in order to select supervisors and reviewers automatically.
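How the three ingredients named in this abstract might fit together can be shown with a minimal sketch. The abstract gives no formula, so the weighting (tf times log(N/df)), the fuzzy-match threshold `max_distance`, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights


def score(query, doc_weights, max_distance=1):
    """Sum the weights of document terms within max_distance edits of a query term."""
    total = 0.0
    for q in query.lower().split():
        for term, w in doc_weights.items():
            if levenshtein(q, term) <= max_distance:
                total += w
    return total


# Hypothetical supervisor profiles and a thesis topic containing a typo.
profiles = ["neural networks image classification",
            "database query optimization"]
weights = tf_idf(profiles)
thesis = "image clasification with neural network"
scores = [score(thesis, w) for w in weights]
best = scores.index(max(scores))  # index of the best-matching profile
```

The edit-distance threshold lets misspelled or inflected terms ("clasification", "network") still match profile terms, which is one plausible reason to combine Levenshtein distance with term weighting.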